fix(smithy-json): escape control characters in write_string per RFC 8259 §7 #647
Open
jason-weddington wants to merge 2 commits into smithy-lang:develop from
…259 §7

StreamingJSONEncoder.write_string() only escaped backslash and double quote. Control characters U+0000–U+001F (newline, tab, CR, etc.) were written as raw bytes, producing invalid JSON that causes a SerializationException on API calls with multi-line string fields. Use a regex to escape all control characters: named escapes for common ones (\n, \r, \t, \b, \f) and \uXXXX for the rest.
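The regex approach described in the commit message can be sketched as a single substitution over the string. This is a minimal, standalone sketch, not the PR's actual `_escape_string()`: the names `escape_json_string` and `naive_escape`, and the choice of lowercase hex digits in the `\u00xx` form, are assumptions made for illustration. `naive_escape` stands in for the pre-fix behavior (backslash and double quote only).

```python
import json
import re

# Named escapes allowed by RFC 8259 §7. Any other character in
# U+0000–U+001F falls back to the six-character \u00xx form.
_NAMED_ESCAPES = {
    '"': '\\"',
    "\\": "\\\\",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
}

# Matches double quote, backslash, and the full control range U+0000–U+001F.
_ESCAPE_RE = re.compile(r'["\\\x00-\x1f]')


def _replace(match: re.Match[str]) -> str:
    char = match.group(0)
    named = _NAMED_ESCAPES.get(char)
    if named is not None:
        return named
    # Remaining control characters, e.g. U+0001 -> \u0001.
    return f"\\u{ord(char):04x}"


def escape_json_string(value: str) -> str:
    """Hypothetical helper: escape `value` for use between JSON double quotes."""
    return _ESCAPE_RE.sub(_replace, value)


def naive_escape(value: str) -> str:
    """Stand-in for the pre-fix behavior: only backslash and double quote."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


if __name__ == "__main__":
    raw = "line one\nline two\tend\x01"
    fixed = '"' + escape_json_string(raw) + '"'
    print(fixed)  # "line one\nline two\tend\u0001"
    assert json.loads(fixed) == raw
    try:
        # Raw control characters inside a JSON string are rejected.
        json.loads('"' + naive_escape(raw) + '"')
    except json.JSONDecodeError as exc:
        print("naive output rejected:", exc.msg)
```

JSON accepts either case for the hex digits; lowercase happens to match what Python's own `json.dumps` emits, which makes the standard library usable as a test oracle.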
jonathan343 (Contributor) reviewed on Feb 27, 2026:
Thanks @jason-weddington!
This looks good to me. Can you add a changelog entry for us so this gets included when we do our next release? You can achieve this by running a command similar to the following:
./scripts/changelog/new-entry.py -t enhancement -p smithy-json -d "Fixed string serialization to escape all control characters (U+0000-U+001F) per [RFC 8259](https://www.rfc-editor.org/rfc/rfc8259#section-7), preventing invalid JSON output for multiline and other control-character-containing strings. ([#647](https://github.com/smithy-lang/smithy-python/pull/647))"
jason-weddington (Author): done!
Summary
StreamingJSONEncoder.write_string() only escapes \ and ". Control characters U+0000–U+001F (\n, \t, \r, etc.) are written as raw bytes, which is invalid JSON per RFC 8259 §7. This causes a SerializationException (HTTP 400) on any API call where a string field contains these characters.

Root Cause
In serializers.py, write_string() is missing escapes for \n, \r, \t, \b, \f, and the other U+0000–U+001F characters required by RFC 8259 §7.

Fix
Replace the inline .replace() chain with a regex-based _escape_string() that handles:

- named escapes for \n, \r, \t, \b, \f (plus the existing \\ and \")
- \uXXXX for the remaining U+0000–U+001F characters

Tests
- JSON_SERDE_CASES
- TestStringControlCharEscaping class with:
  - json.loads validation

All 126 tests pass (106 existing + 20 new).
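One way to express the `json.loads` validation mentioned above is a `unittest` case that round-trips every U+0000–U+001F character and compares the output against `json.dumps(..., ensure_ascii=False)`, which applies the same RFC 8259 escapes. This is a hypothetical sketch, not the PR's `TestStringControlCharEscaping`; the class name and the table-driven `escape` function are illustrative stand-ins for the encoder under test.

```python
import json
import re
import unittest

# Hypothetical stand-in for the function under test: a lookup table built
# once, covering every control character plus " and \.
_ESCAPES = {chr(cp): f"\\u{cp:04x}" for cp in range(0x20)}
_ESCAPES.update(
    {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\r": "\\r",
     "\t": "\\t", "\b": "\\b", "\f": "\\f"}
)
_PATTERN = re.compile(r'["\\\x00-\x1f]')


def escape(value: str) -> str:
    return _PATTERN.sub(lambda m: _ESCAPES[m.group(0)], value)


class ControlCharEscapingSketch(unittest.TestCase):
    def test_every_control_char_round_trips(self) -> None:
        for cp in range(0x20):
            char = chr(cp)
            with self.subTest(codepoint=f"U+{cp:04X}"):
                escaped = escape(char)
                # No raw control character may survive escaping.
                self.assertNotIn(char, escaped)
                # The quoted result must be valid JSON that decodes back.
                self.assertEqual(json.loads(f'"{escaped}"'), char)

    def test_matches_stdlib_encoder(self) -> None:
        samples = [
            "first line\nsecond line",
            'tab\there, quote " and backslash \\',
            "bell\x07 and unit separator\x1f",
            "non-ASCII é stays literal",
        ]
        for sample in samples:
            with self.subTest(sample=sample):
                self.assertEqual(
                    f'"{escape(sample)}"', json.dumps(sample, ensure_ascii=False)
                )
```

Saved as a `test_*.py` module, it is picked up by `python -m unittest`. Using the standard-library encoder as an oracle avoids hand-maintaining a table of expected escape sequences.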
Impact
Affects all services using smithy-json for request serialization. Any string field containing control characters produces invalid JSON. For services where string fields commonly contain newlines, such as the Bedrock Runtime Converse API, this is the primary failure path.

Prior Art
The Rust implementation in smithy-rs already handles this correctly in escape.rs. This PR brings the Python implementation in line with the Rust behavior.
Reproduction